- DICMERGE.EXE (VERSION 2)
- DICTIONARY MAINTENANCE FOR JWRITE
-
- 1. Introduction.
-
- JWRITE has no built-in functions for the maintenance of the dictionary. This
- is nevertheless a desirable feature to have. If the standard dictionary does
- not contain kanji equivalents of words which you frequently need in your field
- of business, it would be nice to be able to add them to the dictionary.
-
- This IS possible, using the enclosed program DICMERGE.EXE, but I must warn you
- that this is a rather complicated operation. Please follow these instructions
- carefully. Before you start, copy the dictionary (WNNSJIS.DIC) and its index
- file (WNNSJIS.IND) to a safe place.
-
-
- 2. Dictionary files
- A dictionary file like WNNSJIS.DIC consists of lines of SJIS coded text. Every
- line consists of the following elements, from left to right:
-
- a- the KEYWORD, which must be in ASCII (hankaku romaji) or hiragana. Katakana
- keywords are not allowed. From version 2, the keyword may be a "key
- phrase" that includes spaces. In general, a line may not begin with a
- space; this is considered a "sort violation" and causes the program to
- abort (see below).
- b- one SPACE (don't forget this!)
- c- one FORWARD SLASH (/).
- d- one or more possible TRANSLATIONS for the keyword. The translations may
- be written in any character type: katakana, hiragana, kanji, big or
- small ASCII, or special characters.
- Every translation, including the last one, must be followed by a
- slash. Any text after the last slash will be ignored.
- e- a LINE FEED CHARACTER for signalling the end of the line. It would be
- possible to have carriage return - line feed combinations at the end of
- each line, but in a dictionary containing tens of thousands of lines, that
- would just be tens of thousands of extra bytes.
-
- An example of a dictionary line is:
-
- きこう /寄港/機構/気候/
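-
- The line format can be illustrated with a minimal sketch in Python. This is
- not part of DICMERGE; the function name parse_dic_line is hypothetical, and
- the sketch assumes that Python's "shift_jis" codec matches the SJIS coding
- of the dictionary.

```python
def parse_dic_line(raw: bytes):
    """Split one SJIS dictionary line into (keyword, [translations])."""
    text = raw.decode("shift_jis").rstrip("\r\n")
    # The keyword ends at the first space-plus-slash pair; a key phrase
    # may itself contain spaces.
    keyword, sep, rest = text.partition(" /")
    if not sep or not keyword:
        raise ValueError("missing ' /' separator or empty keyword")
    # Every translation is followed by a slash, so the piece after the
    # last slash is ignored.
    parts = rest.split("/")
    translations = [t for t in parts[:-1] if t]
    if not translations:
        raise ValueError("no translation terminated by a slash")
    return keyword, translations


line = "きこう /寄港/機構/気候/\n".encode("shift_jis")
print(parse_dic_line(line))
```

- A line with only one slash, such as "abc /ABC", raises ValueError in this
- sketch because no translation is terminated by a slash.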
-
- In other words, the dictionary is just a text file which can be read and
- edited (in principle) by JWRITE itself. In theory, JWRITE could be directly
- used for maintenance work on its own dictionary. Unfortunately, the size of the
- dictionary file makes this impossible in practice. The dictionary file cannot
- possibly fit in memory. (You can try loading it with JWRITE WNNSJIS.DIC, to see
- the beginning of the dictionary, but please do not actually try to change
- anything. You can also view, but not edit, the entire dictionary if you use
- Vernon Buerg's LIST.COM, version 7.5i. Use the /B switch to let LIST run under
- KDPLUS. If you use the KDPLUS keyboard input utility KJIN, you can even
- look up words in the dictionary using LIST.)
-
- However, you can add new information to the dictionary by making a small
- dictionary for yourself, containing the information that you want to add, and
- merging it with the existing dictionary. This can be done with the program
- DICMERGE.
-
- Notice that it is only possible to ADD to the dictionary this way. You cannot
- remove anything from it, at least not with this utility. Utilities for
- removing dictionary lines can, however, be made. They should overwrite the
- unneeded lines with spaces; a DICMERGE operation on the file will then
- re-create a valid index.
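-
- A removal utility of this kind might look like the following Python sketch.
- It is only an illustration: the helper name blank_out_line and the test file
- TEST.DIC are hypothetical. It overwrites the line in place, leaving the line
- feed alone, so that the file size stays the same. Try it on a copy of the
- dictionary only, and run DICMERGE afterwards to rebuild the index.

```python
import os
import tempfile


def blank_out_line(path, keyword, encoding="shift_jis"):
    """Overwrite the first line with this keyword by spaces; True if found."""
    prefix = keyword.encode(encoding) + b" /"
    with open(path, "r+b") as f:
        while True:
            start = f.tell()
            raw = f.readline()
            if not raw:
                return False            # end of file, keyword not found
            if raw.startswith(prefix):
                body_len = len(raw.rstrip(b"\r\n"))
                f.seek(start)
                f.write(b" " * body_len)  # the line terminator is kept
                return True


# Small demonstration on a throw-away file.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "TEST.DIC")
    original = "abc /ABC/\nきこう /寄港/\nちゅうい /注意/\n".encode("shift_jis")
    with open(path, "wb") as f:
        f.write(original)
    found = blank_out_line(path, "きこう")
    with open(path, "rb") as f:
        result = f.read()
```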
-
- 3. Making an update dictionary
- You make your update dictionary as a text file, using the rules a-d specified
- above (rule e is not important for the update file, because DICMERGE will
- convert CR/LF combinations to single LFs). You can add completely new
- keywords with their translations, and also new translations for existing
- keywords. For instance, the present version of WNNSJIS.DIC has only one
- translation for the keyword ちゅうい, namely 注意. Now imagine that you often
- need military terms, and you want to have the word 中尉 (also pronounced
- ちゅうい) in the dictionary as well. Your update text (call it, for instance,
- PRIVATE.DIC) must then contain the line
-
- ちゅうい /中尉/
-
- (Because 中尉 is not in the dictionary yet, you must construct the word from
- the separate kanji 中 and 尉, which are in the dictionary as single kanji,
- to be found through their pronunciations ちゅう and い).
-
- Your update text may contain many such lines. It is IMPORTANT (in fact this is
- the most important and the most difficult bit of the whole operation) that the
- file be SORTED:
-
- -the lines with an alphabetical keyword must come before the lines
- with a hiragana keyword
- -the lines with an alphabetical keyword must be in alphabetical order (in
- fact, in standard ASCII order. Look at any ASCII table).
- -the lines with a hiragana keyword must be in Japanese kana order
- (あ-い-う-え-お-か-き-く-け....etc.)
-
- When you have finished, save the file. If you're not sure about the sorting,
- use the DOS SORT utility.
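-
- As an alternative illustration, the ordering rules can be sketched in Python.
- The sketch ASSUMES that comparing the SJIS bytes of the keywords gives the
- order DICMERGE expects: ASCII bytes are all smaller than the first byte of
- any hiragana, and the SJIS hiragana codes follow kana order. Whether this
- matches DICMERGE in every detail has not been verified. The function name
- sort_update_lines is hypothetical.

```python
def sort_update_lines(lines, encoding="shift_jis"):
    """Return the lines sorted by the SJIS bytes of their keyword."""
    def key(line):
        keyword = line.split(" /", 1)[0]   # text before the first " /"
        return keyword.encode(encoding)
    return sorted(lines, key=key)


update = [
    "ちゅうい /中尉/",
    "tokyo /東京/",
    "きこう /寄港/",
    "abc /ABC/",
]
for line in sort_update_lines(update):
    print(line)
```

- In this sketch the two alphabetical keywords come out first (abc, then
- tokyo), followed by きこう and ちゅうい.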
-
-
- 4. Merging with the existing dictionary
-
- Now go to DOS and type
-
- DICMERGE
-
- Type in the names of the two dictionaries: file 1 is PRIVATE.DIC, file 2 is the
- existing dictionary, WNNSJIS.DIC. The new (merged) dictionary will be called
- MERGE.DIC; at the same time an index file will be made for it, MERGE.IND.
- MERGE.DIC will separate lines by line feeds only, no carriage returns.
-
- If you type in only one dictionary name, and just press ENTER when asked
- for the other one, DICMERGE will still run; its only function will then
- be to re-create the index for the one dictionary that you specified. (You
- might need this if the index file has been lost or corrupted.)
-
- The merge process will take some time (a minute or so, for a large dictionary).
- You can follow its progress on your screen, as new entries are constructed for
- the index (this works best when you are in the KDPLUS environment).
-
- Illegally formed lines (e.g. lines with only one slash in them, or lines
- which begin with a katakana or a kanji) are discarded. The program will
- warn you, but continue with the merge process.
-
- However, lines which are legally formed but are not in proper sorted order
- will cause the program to abort, displaying the location of the "sort
- violation". You can then try to correct the situation before proceeding to the
- next step. If the program halts for that reason, no MERGE.DIC and MERGE.IND
- files are generated (for your protection).
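-
- The difference between a discarded line and a sort violation can be sketched
- in Python as follows. This is a rough model, not the actual DICMERGE code:
- the names is_legal, check_lines and SortViolation are hypothetical, the order
- test compares SJIS bytes of the keywords (an assumption), and the silent
- skipping of lines that consist only of spaces is also an assumption.

```python
class SortViolation(Exception):
    """Raised when a line is out of order (the merge would abort)."""


def is_legal(line):
    first = line[:1]
    starts_ascii = "!" <= first <= "~"               # printable ASCII, no space
    starts_hiragana = "\u3041" <= first <= "\u3093"  # first to last hiragana
    # A legal line has the " /" separator and at least two slashes.
    return (starts_ascii or starts_hiragana) and " /" in line and line.count("/") >= 2


def check_lines(lines, encoding="shift_jis"):
    kept, discarded = [], []
    previous = b""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue                     # assumption: blanked-out lines are skipped
        if line.startswith(" "):
            raise SortViolation(f"line {number} begins with a space")
        if not is_legal(line):
            discarded.append(number)     # warn and carry on
            continue
        key = line.split(" /", 1)[0].encode(encoding)
        if key < previous:               # equal keywords are let through here
            raise SortViolation(f"line {number} is out of order: {line}")
        previous = key
        kept.append(line)
    return kept, discarded


sample = ["abc /ABC/", "カタカナ /x/", "きこう /寄港", "きこう /寄港/機構/", "ちゅうい /中尉/"]
kept, discarded = check_lines(sample)
print(kept, discarded)
```

- Here lines 2 (katakana keyword) and 3 (only one slash) are discarded, while
- a list with ちゅうい before きこう would raise SortViolation.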
-
- When the merge process is finished, you must enter the following commands:
-
- del wnnsjis.* (I hope you saved the old version somewhere..)
- ren merge.* wnnsjis.*
-
- From that moment, the new keywords and new meanings have been added to the
- dictionary, and are accessible by means of the ALT-L function of JWRITE. If
- there were "merged" entries (the same keyword occurring in both input
- dictionaries but with different translations) the translations from the
- first input dictionary will be listed first on the corresponding line of the
- output dictionary.
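-
- The merging rule can be sketched in Python as a two-way merge of sorted
- entries. This is a sketch under assumptions, not the DICMERGE algorithm
- itself: the names merge_dictionaries and format_line are hypothetical,
- entries are ordered by the SJIS bytes of their keywords, and a translation
- that occurs in both inputs is written only once (the text above does not
- say how DICMERGE treats such duplicates).

```python
def merge_dictionaries(first, second, encoding="shift_jis"):
    """Merge two sorted lists of (keyword, [translations]) entries."""
    def key(entry):
        return entry[0].encode(encoding)

    merged = []
    i = j = 0
    while i < len(first) and j < len(second):
        a, b = first[i], second[j]
        if key(a) < key(b):
            merged.append(a)
            i += 1
        elif key(b) < key(a):
            merged.append(b)
            j += 1
        else:
            # Same keyword: translations of the first input are listed first.
            extra = [t for t in b[1] if t not in a[1]]
            merged.append((a[0], a[1] + extra))
            i += 1
            j += 1
    merged.extend(first[i:])
    merged.extend(second[j:])
    return merged


def format_line(entry):
    keyword, translations = entry
    return keyword + " /" + "/".join(translations) + "/"


private = [("ちゅうい", ["中尉"])]
existing = [("きこう", ["寄港", "機構", "気候"]), ("ちゅうい", ["注意"])]
for entry in merge_dictionaries(private, existing):
    print(format_line(entry))
```

- With PRIVATE.DIC as the first input, the merged line for ちゅうい lists 中尉
- before 注意.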
-
- 5. A tip.
- Here and there in public domain sources you can find dictionary files. If
- you are sure that they conform to the rules mentioned in section 2, you can
- merge them with your existing dictionary to increase its capabilities. There
- will be a penalty: the bigger the dictionary, the slower it will be in general.
- Test it with a "slow word", like 新聞 (the position of this word in the list
- is such that looking for it will take some time, more than a second).
-
-
- 6. New features in version 2.
- It is now possible to do a DICMERGE on only one dictionary (just press ENTER
- when asked for the other one). This will re-create the index.
-
- The program performs more stringent checking on the dictionary lines, reducing
- the chances of destroying your dictionary by merging it with a file containing
- illegal lines.
-
- It is now possible (in principle) to delete lines from the dictionary by
- overwriting them with spaces.
-
- "Key phrases" (with spaces in them) are now allowed. However, the lookup
- mechanism of JWRITE will not recognize them unless your version of JWRITE
- is 1.5 or higher.
-
- Katakana keywords are now detected and discarded. In the previous version
- it seemed that you could get away with including katakana keywords (if you
- put them at the end of the update dictionary), but in fact each katakana entry
- would make some alphabetic entries inaccessible by destroying the index
- pointers to them.
-
-
- 7. Note.
- A file NUMBERS.DIC, with which you can extend the dictionary with novel
- number symbols like ⑤ and Ⅷ, has been provided in this archive for
- test purposes.
-
- Tokyo, 5 January 1992 (Revised 3 March 1992)
- Jan W. Stumpel
-
-
-